Curieux.JY
  • Jung Yeon Lee
  • ✒️Note

👩‍💻Chord Graph

chord
visualization
code
Drawing a Chord Graph with HoloViews
Published

April 27, 2023

https://towardsdatascience.com/plotting-chord-diagrams-in-python-72fd71b3eef0

import pandas as pd

df = pd.read_csv('../flights.csv')
df
/home/jungyeon/anaconda3/envs/imi/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (7,8) have mixed types.Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT DESTINATION_AIRPORT SCHEDULED_DEPARTURE ... ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY WEATHER_DELAY
0 2015 1 1 4 AS 98 N407AS ANC SEA 5 ... 408.0 -22.0 0 0 NaN NaN NaN NaN NaN NaN
1 2015 1 1 4 AA 2336 N3KUAA LAX PBI 10 ... 741.0 -9.0 0 0 NaN NaN NaN NaN NaN NaN
2 2015 1 1 4 US 840 N171US SFO CLT 20 ... 811.0 5.0 0 0 NaN NaN NaN NaN NaN NaN
3 2015 1 1 4 AA 258 N3HYAA LAX MIA 20 ... 756.0 -9.0 0 0 NaN NaN NaN NaN NaN NaN
4 2015 1 1 4 AS 135 N527AS SEA ANC 25 ... 259.0 -21.0 0 0 NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5819074 2015 12 31 4 B6 688 N657JB LAX BOS 2359 ... 753.0 -26.0 0 0 NaN NaN NaN NaN NaN NaN
5819075 2015 12 31 4 B6 745 N828JB JFK PSE 2359 ... 430.0 -16.0 0 0 NaN NaN NaN NaN NaN NaN
5819076 2015 12 31 4 B6 1503 N913JB JFK SJU 2359 ... 432.0 -8.0 0 0 NaN NaN NaN NaN NaN NaN
5819077 2015 12 31 4 B6 333 N527JB MCO SJU 2359 ... 330.0 -10.0 0 0 NaN NaN NaN NaN NaN NaN
5819078 2015 12 31 4 B6 839 N534JB JFK BQN 2359 ... 442.0 2.0 0 0 NaN NaN NaN NaN NaN NaN

5819079 rows × 31 columns

  • ORIGIN_AIRPORT
  • DESTINATION_AIRPORT

Finding the Relationships between Origin and Destination Airports

df_between_airports = df.groupby(by=["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"]).count()
df_between_airports
YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER SCHEDULED_DEPARTURE DEPARTURE_TIME DEPARTURE_DELAY ... ARRIVAL_TIME ARRIVAL_DELAY DIVERTED CANCELLED CANCELLATION_REASON AIR_SYSTEM_DELAY SECURITY_DELAY AIRLINE_DELAY LATE_AIRCRAFT_DELAY WEATHER_DELAY
ORIGIN_AIRPORT DESTINATION_AIRPORT
10135 10397 83 83 83 83 83 83 83 83 83 83 ... 83 83 83 83 0 13 13 13 13 13
11433 71 71 71 71 71 71 71 71 71 71 ... 71 71 71 71 0 15 15 15 15 15
13930 72 72 72 72 72 72 72 72 72 72 ... 72 72 72 72 0 12 12 12 12 12
10136 11298 189 189 189 189 189 189 189 189 182 182 ... 182 181 189 189 7 21 21 21 21 21
10140 10397 86 86 86 86 86 86 86 86 86 86 ... 86 86 86 86 0 6 6 6 6 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
XNA SFO 52 52 52 52 52 52 52 52 51 51 ... 51 51 52 52 1 14 14 14 14 14
SLC 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 0 0 0 0 0 0
YAK CDV 331 331 331 331 331 331 331 331 328 328 ... 325 325 331 331 3 42 42 42 42 42
JNU 331 331 331 331 331 331 331 331 329 329 ... 325 325 331 331 2 29 29 29 29 29
YUM PHX 1877 1877 1877 1877 1877 1877 1877 1877 1854 1854 ... 1854 1854 1877 1877 23 200 200 200 200 200

12377 rows × 29 columns

df_between_airports = df_between_airports['YEAR'].rename('COUNT').reset_index() 
df_between_airports
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
0 10135 10397 83
1 10135 11433 71
2 10135 13930 72
3 10136 11298 189
4 10140 10397 86
... ... ... ...
12372 XNA SFO 52
12373 XNA SLC 1
12374 YAK CDV 331
12375 YAK JNU 331
12376 YUM PHX 1877

12377 rows × 3 columns
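
The two steps above count every column per airport pair and then keep only the YEAR column as the count. As a minimal sketch of an equivalent shortcut, groupby(...).size() counts the rows per pair directly, so the result does not depend on any one column being free of missing values. The `sample` frame and its values are made up for illustration and stand in for flights.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for flights.csv (values are made up).
sample = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015],
    "ORIGIN_AIRPORT": ["SFO", "SFO", "LAX", "SFO"],
    "DESTINATION_AIRPORT": ["LAX", "LAX", "SFO", "JFK"],
})

# size() returns the number of rows in each (origin, destination) group.
pair_counts = (
    sample.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
    .size()
    .rename("COUNT")
    .reset_index()
)
print(pair_counts)
```

The result has the same three columns (ORIGIN_AIRPORT, DESTINATION_AIRPORT, COUNT) as the table above.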

df_between_airports = df_between_airports.query('ORIGIN_AIRPORT.str.len() <= 3 & DESTINATION_AIRPORT.str.len() <= 3')
df_between_airports
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
7684 ABE ATL 898
7685 ABE DTW 711
7686 ABE ORD 665
7687 ABI DFW 2329
7688 ABQ ATL 801
... ... ... ...
12372 XNA SFO 52
12373 XNA SLC 1
12374 YAK CDV 331
12375 YAK JNU 331
12376 YUM PHX 1877

4693 rows × 3 columns
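
The query above keeps only the rows where both airport codes are at most three characters long, which drops the five-digit numeric codes seen in the earlier table. The same filter can be sketched with explicit boolean masks; casting to str first is an assumption made here so that integer codes are handled on purpose rather than by way of missing values. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up rows mixing five-digit numeric codes with three-letter codes.
pairs = pd.DataFrame({
    "ORIGIN_AIRPORT": [10135, "ABE", "XNA", 10140],
    "DESTINATION_AIRPORT": [10397, "ATL", 13930, "SFO"],
    "COUNT": [83, 898, 5, 7],
})

# Cast to str so integers such as 10135 become "10135" (length 5).
origin_ok = pairs["ORIGIN_AIRPORT"].astype(str).str.len() <= 3
dest_ok = pairs["DESTINATION_AIRPORT"].astype(str).str.len() <= 3

# Keep a row only when both of its codes pass the length check.
letter_pairs = pairs[origin_ok & dest_ok]
print(letter_pairs)
```

Only the ABE to ATL row survives, because every other row has a numeric code on at least one side.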

df_between_airports = df_between_airports.sort_values(by="COUNT",ascending=False)
df_between_airports
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
11867 SFO LAX 13744
10165 LAX SFO 13457
9949 JFK LAX 12016
10130 LAX JFK 12015
10053 LAS LAX 9715
... ... ... ...
11726 SBN COS 1
10329 MCI AUS 1
8139 BOI EUG 1
9688 IAD TTN 1
10227 LGA EYW 1

4693 rows × 3 columns

top = 40
df_between_airports.head(top)['ORIGIN_AIRPORT'].unique()
array(['SFO', 'LAX', 'JFK', 'LAS', 'LGA', 'ORD', 'OGG', 'HNL', 'ATL',
       'MCO', 'DFW', 'SEA', 'BOS', 'DCA', 'FLL', 'PHX', 'DEN', 'TPA'],
      dtype=object)
df_between_airports.head(top)
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
11867 SFO LAX 13744
10165 LAX SFO 13457
9949 JFK LAX 12016
10130 LAX JFK 12015
10053 LAS LAX 9715
10253 LGA ORD 9639
10132 LAX LAS 9594
11085 ORD LGA 9575
11864 SFO JFK 8440
9974 JFK SFO 8437
10925 OGG HNL 8313
9536 HNL OGG 8282
10149 LAX ORD 8256
7887 ATL LGA 8234
10204 LGA ATL 8215
7892 ATL MCO 8202
10372 MCO ATL 8202
11866 SFO LAS 7995
11082 ORD LAX 7941
10083 LAS SFO 7870
8989 DFW ORD 7870
11797 SEA LAX 7765
10164 LAX SEA 7732
8168 BOS DCA 7687
8693 DCA BOS 7686
11044 ORD DFW 7677
7854 ATL FLL 7419
9337 FLL ATL 7411
11342 PHX LAX 7380
11887 SFO ORD 7380
11138 ORD SFO 7378
10153 LAX PHX 7330
8851 DEN PHX 7211
11322 PHX DEN 7204
10208 LGA BOS 7100
8182 BOS LGA 7096
12220 TPA ATL 7083
7955 ATL TPA 7076
8891 DFW ATL 7060
7839 ATL DFW 7056

Displaying the Chord Diagram

import holoviews as hv
hv.extension('bokeh')

HoloViews uses the %%opts cell magic to change how a cell's output is displayed. The Chord class is used to display the chord diagram.

In the chord diagram, each circle (called a node) represents an airport. To see the relationships between airports, hover the mouse over a circle.

%%opts Chord [height=500 width=500 title="Flights between airports" ]
chord = hv.Chord(df_between_airports.head(top))
chord

In the resulting chord diagram, it is hard to tell which airports are the destinations. So we collect the list of origin and destination airports and use it to create an hv.Dataset object.

# get the top count of flights between airports
df_between_airports = df_between_airports.head(top)

# find all the unique origin and destination airports
airports = list(set(df_between_airports["ORIGIN_AIRPORT"].unique().tolist() + 
                    df_between_airports["DESTINATION_AIRPORT"].unique().tolist()))

airports_dataset = hv.Dataset(pd.DataFrame(airports, columns=["Airport"]))
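
One detail of the step above: a Python set of strings has no guaranteed iteration order, so the airport list can come out in a different order from one run to the next, which may change how the nodes are arranged. A minimal sketch of a variant that sorts the codes so the list is reproducible; the `edges` frame is a stand-in built from three rows of the top-40 table, and the names `edges` and `nodes` are chosen here for illustration.

```python
import pandas as pd

# Three rows copied from the top-40 table, standing in for df_between_airports.
edges = pd.DataFrame({
    "ORIGIN_AIRPORT": ["SFO", "LAX", "JFK"],
    "DESTINATION_AIRPORT": ["LAX", "SFO", "LAX"],
    "COUNT": [13744, 13457, 12016],
})

# Union of both columns, sorted so the result is the same on every run.
airports = sorted(set(edges["ORIGIN_AIRPORT"]) | set(edges["DESTINATION_AIRPORT"]))

# One-column frame of node names, ready to be wrapped in hv.Dataset.
nodes = pd.DataFrame({"Airport": airports})
print(airports)  # ['JFK', 'LAX', 'SFO']
```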

To display the airport name at each node, set the labels attribute in the %%opts cell magic and pass the airports_dataset variable to the Chord class initializer.

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
chord = hv.Chord((df_between_airports, airports_dataset))
chord

Note that the df_between_airports and airports_dataset variables are wrapped together in a tuple.

(df_between_airports, airports_dataset)
(      ORIGIN_AIRPORT DESTINATION_AIRPORT  COUNT
 11867            SFO                 LAX  13744
 10165            LAX                 SFO  13457
 9949             JFK                 LAX  12016
 10130            LAX                 JFK  12015
 10053            LAS                 LAX   9715
 10253            LGA                 ORD   9639
 10132            LAX                 LAS   9594
 11085            ORD                 LGA   9575
 11864            SFO                 JFK   8440
 9974             JFK                 SFO   8437
 10925            OGG                 HNL   8313
 9536             HNL                 OGG   8282
 10149            LAX                 ORD   8256
 7887             ATL                 LGA   8234
 10204            LGA                 ATL   8215
 7892             ATL                 MCO   8202
 10372            MCO                 ATL   8202
 11866            SFO                 LAS   7995
 11082            ORD                 LAX   7941
 10083            LAS                 SFO   7870
 8989             DFW                 ORD   7870
 11797            SEA                 LAX   7765
 10164            LAX                 SEA   7732
 8168             BOS                 DCA   7687
 8693             DCA                 BOS   7686
 11044            ORD                 DFW   7677
 7854             ATL                 FLL   7419
 9337             FLL                 ATL   7411
 11342            PHX                 LAX   7380
 11887            SFO                 ORD   7380
 11138            ORD                 SFO   7378
 10153            LAX                 PHX   7330
 8851             DEN                 PHX   7211
 11322            PHX                 DEN   7204
 10208            LGA                 BOS   7100
 8182             BOS                 LGA   7096
 12220            TPA                 ATL   7083
 7955             ATL                 TPA   7076
 8891             DFW                 ATL   7060
 7839             ATL                 DFW   7056, :Dataset   [Airport])
type(airports_dataset)
holoviews.core.data.Dataset

Applying Colors to the Chord Diagram

  • https://docs.bokeh.org/en/latest/docs/reference/palettes.html

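node_cmap sets the palette applied to the nodes, and edge_cmap sets the palette applied to the edges of the chord diagram. node_color and edge_color name the columns whose values decide the color of each node and edge.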

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Category20')
chord = hv.Chord((df_between_airports, airports_dataset))
chord

Applying a different palette

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Bokeh')
chord = hv.Chord((df_between_airports, airports_dataset))
chord

Confirming the Relationships in the Chord Diagram

In the last section of this post, we want to confirm that the chord diagram displays the correct information. Selecting JFK shows that its flights go to LAX and SFO.

df_between_airports.query('ORIGIN_AIRPORT == "JFK"')
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
9949 JFK LAX 12016
9974 JFK SFO 8437
df_between_airports.query('ORIGIN_AIRPORT == "ORD"')
ORIGIN_AIRPORT DESTINATION_AIRPORT COUNT
11085 ORD LGA 9575
11082 ORD LAX 7941
11044 ORD DFW 7677
11138 ORD SFO 7378
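
The queries above list only the flights leaving an airport. Since a chord attached to a node can stand for either direction, it may help to check the incoming flights as well. A minimal sketch, assuming a hypothetical helper named `routes_for`; the `edges` frame holds the four JFK rows copied from the top-40 table.

```python
import pandas as pd

# The four rows involving JFK, copied from the top-40 table.
edges = pd.DataFrame({
    "ORIGIN_AIRPORT": ["JFK", "LAX", "SFO", "JFK"],
    "DESTINATION_AIRPORT": ["LAX", "JFK", "JFK", "SFO"],
    "COUNT": [12016, 12015, 8440, 8437],
})

def routes_for(edges, airport):
    """Return (outgoing, incoming) rows for one airport code."""
    outgoing = edges[edges["ORIGIN_AIRPORT"] == airport]
    incoming = edges[edges["DESTINATION_AIRPORT"] == airport]
    return outgoing, incoming

outgoing, incoming = routes_for(edges, "JFK")
print(sorted(outgoing["DESTINATION_AIRPORT"]))  # ['LAX', 'SFO']
print(sorted(incoming["ORIGIN_AIRPORT"]))       # ['LAX', 'SFO']
```

For JFK, both directions involve the same two airports, LAX and SFO, which matches the query output above.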